Robust Machine Learning by Integrating Context
Intelligent software has the potential to transform our society and is becoming the building block for many real-world systems. However, despite the excellent performance of machine learning models on benchmarks, state-of-the-art methods like neural networks often fail once they encounter realistic settings. Because neural networks often learn correlations rather than reasoning over the right signals and knowledge, they fail when facing shifting distributions, unforeseen corruptions, and worst-case scenarios. Moreover, as black-box models, they are neither interpretable nor trusted by their users. We need to build robust models for machine learning to be confidently and responsibly deployed in the most critical applications and systems.
In this dissertation, I introduce our advances toward robust machine learning systems, achieved by tightly integrating context into the algorithms. This context has two aspects: the intrinsic structure of natural data, and the extrinsic structure from domain knowledge. Both are crucial: by capitalizing on the intrinsic structure of natural data, my work has shown that we can create machine learning systems that are robust even in the worst case, an analytical result that also enjoys strong empirical gains.
By integrating external knowledge, such as associations between tasks and causal structure, my framework can instruct models to use the right signals for inference, opening new opportunities for controllable and interpretable models.
This thesis consists of three parts. In the first part, I cover three works that use intrinsic structure as a constraint to achieve robust inference. I present our framework that performs test-time optimization to respect the natural constraint, which is captured by self-supervised tasks (a minimal code sketch follows this abstract), and I illustrate that test-time optimization improves out-of-distribution generalization and adversarial robustness. Beyond the inference algorithm, I show that capturing intrinsic structure through discrete representations also improves out-of-distribution robustness.
In the second part of the thesis, I detail my work using external domain knowledge. I first introduce using causal structure derived from domain knowledge to improve robustness under domain generalization. I then show how associating multiple tasks and regularization objectives helps robustness.
In the final part of this dissertation, I present three works on trustworthy and reliable foundation models: general-purpose models that will serve as the foundation for many AI applications. I show a framework that uses context to secure, interpret, and control foundation models.
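As a concrete illustration of the test-time optimization described in the first part, below is a minimal sketch in PyTorch. It is not the dissertation's implementation: it assumes the model is split into an encoder, a self-supervised head, and a classification head, and it uses rotation prediction as the self-supervised task that captures the natural constraint.

```python
import copy
import torch
import torch.nn.functional as F

def rotate_batch(x):
    """Return 4 rotated copies (0/90/180/270 degrees) and rotation labels."""
    rotations = [torch.rot90(x, k, dims=(-2, -1)) for k in range(4)]
    labels = torch.arange(4, device=x.device).repeat_interleave(x.size(0))
    return torch.cat(rotations, dim=0), labels

def test_time_adapt(encoder, ssl_head, cls_head, x, steps=10, lr=1e-3):
    """Adapt a copy of the encoder on a single test batch by minimizing
    the self-supervised rotation loss, then classify with the adapted
    features. The deployed weights are left untouched."""
    enc = copy.deepcopy(encoder)
    opt = torch.optim.SGD(enc.parameters(), lr=lr)
    for _ in range(steps):
        x_rot, y_rot = rotate_batch(x)
        loss = F.cross_entropy(ssl_head(enc(x_rot)), y_rot)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return cls_head(enc(x))
```

The same loop extends to other self-supervised objectives; the key design choice is that only unlabeled test data drives the adaptation.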
Convolutional Visual Prompt for Robust Visual Perception
Vision models are often vulnerable to out-of-distribution (OOD) samples
without adaptation. While visual prompts offer a lightweight method of
input-space adaptation for large-scale vision models, they rely on a
high-dimensional additive vector and labeled data. This leads to overfitting
when adapting models in a self-supervised test-time setting without labels. We
introduce convolutional visual prompts (CVP) for label-free test-time
adaptation for robust visual perception. The structured nature of CVP demands
fewer trainable parameters, less than 1% of those of standard visual prompts,
combating overfitting. Extensive experiments and analysis on a wide variety of
OOD visual perception tasks show that our approach is effective, improving
robustness by up to 5.87% over several large-scale models.
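As a rough sketch of how such a prompt can work (an illustration under assumptions, not the paper's released code): the prompt below is a depthwise convolution initialized to the identity kernel, and `ssl_loss` stands in for whichever label-free self-supervised objective drives the adaptation.

```python
import torch
import torch.nn.functional as F

class ConvPrompt(torch.nn.Module):
    """A small learnable convolution applied to the input image; far
    fewer parameters than a full-resolution additive prompt."""
    def __init__(self, channels=3, kernel_size=3):
        super().__init__()
        k = torch.zeros(channels, 1, kernel_size, kernel_size)
        k[:, 0, kernel_size // 2, kernel_size // 2] = 1.0  # identity init
        self.kernel = torch.nn.Parameter(k)

    def forward(self, x):
        # Depthwise conv: each channel is filtered independently.
        return F.conv2d(x, self.kernel, padding=self.kernel.size(-1) // 2,
                        groups=x.size(1))

def adapt_prompt(model, ssl_loss, x, steps=5, lr=1e-2):
    """Tune only the prompt's kernel on one unlabeled test batch."""
    prompt = ConvPrompt(channels=x.size(1)).to(x.device)
    opt = torch.optim.Adam(prompt.parameters(), lr=lr)
    for _ in range(steps):
        loss = ssl_loss(model, prompt(x))  # label-free objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model(prompt(x))
```

Because only the small kernel is optimized, the adaptation has orders of magnitude fewer free parameters than a full-resolution additive prompt, which is what combats overfitting.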
Robustifying Language Models with Test-Time Adaptation
Large-scale language models have achieved state-of-the-art performance on a
number of language tasks. However, they fail on adversarial language examples,
which are sentences optimized to fool the language models but with similar
semantic meanings for humans. While prior work focuses on making the language
model robust at training time, retraining for robustness is often unrealistic
for large-scale foundation models. Instead, we propose to make the language
models robust at test time. By dynamically adapting the input sentence with
predictions from masked words, we show that we can reverse many language
adversarial attacks. Since our approach does not require any training, it works
for novel tasks at test time and can adapt to novel adversarial corruptions.
Visualizations and empirical results on two popular sentence classification
datasets demonstrate that our method can repair adversarial language attacks
over 65% of the time.
Comment: 8 pages, 2 figures. Submitted to ICLR Workshop.
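A minimal sketch of the masking-and-infilling idea follows, assuming an off-the-shelf Hugging Face masked language model; the word-by-word loop and greedy single-token replacement are simplifications for illustration, not the paper's exact procedure.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def repair(sentence: str) -> str:
    """Mask each word in turn and replace it with the masked LM's top
    prediction, undoing many word-substitution adversarial attacks."""
    words = sentence.split()
    for i in range(len(words)):
        masked = words[:i] + [tok.mask_token] + words[i + 1:]
        enc = tok(" ".join(masked), return_tensors="pt")
        logits = mlm(**enc).logits
        # Position of the [MASK] token in the encoded sequence.
        pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
        top_id = logits[0, pos].argmax().item()
        words[i] = tok.decode([top_id]).strip()
    return " ".join(words)

print(repair("an absorbing and unexpectedly moving film"))
```

Because nothing is retrained, the same repair procedure applies to novel tasks and novel adversarial corruptions at test time.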
1st ICLR International Workshop on Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data (PAIR^2Struct)
Recent years have seen principles and guidance relating to the accountable
and ethical use of artificial intelligence (AI) spring up around
the globe. Specifically, Data Privacy, Accountability, Interpretability,
Robustness, and Reasoning have been broadly recognized as fundamental
principles of using machine learning (ML) technologies on decision-critical
and/or privacy-sensitive applications. On the other hand, in numerous
real-world applications, data itself can be well represented as various
structured formalisms, such as graph-structured data (e.g., networks),
grid-structured data (e.g., images), sequential data (e.g., text), etc. By
exploiting the inherently structured knowledge, one can design plausible
approaches to identify and use more relevant variables to make reliable
decisions, thereby facilitating real-world deployments.
Robust Perception through Equivariance
Deep networks for computer vision are not reliable when they encounter
adversarial examples. In this paper, we introduce a framework that uses the
dense intrinsic constraints in natural images to robustify inference. By
introducing constraints at inference time, we can shift the burden of
robustness from training to the inference algorithm, thereby allowing the model
to adjust dynamically to each individual image's unique and potentially novel
characteristics at inference time. Among different constraints, we find that
equivariance-based constraints are most effective, because they allow dense
constraints in the feature space without overly constraining the representation
at a fine-grained level. Our theoretical results validate the importance of
having such dense constraints at inference time. Our empirical experiments show
that restoring feature equivariance at inference time defends against
worst-case adversarial perturbations. The method obtains improved adversarial
robustness on four datasets (ImageNet, Cityscapes, PASCAL VOC, and MS-COCO)
across image recognition, semantic segmentation, and instance segmentation tasks.
Project page is available at equi4robust.cs.columbia.edu.
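As an illustrative sketch of the equivariance constraint (under assumptions; this is not the project's released code): `features` is assumed to return a spatial feature map, horizontal flipping plays the role of the transformation, and an additive input perturbation is optimized at inference so that F(T(x)) matches T(F(x)).

```python
import torch
import torch.nn.functional as F

def equivariance_defense(features, classifier, x, steps=20, lr=1e-2):
    """Optimize a small input perturbation so the feature map of the
    flipped image matches the flipped feature map, then classify."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        xi = (x + delta).clamp(0, 1)
        f = features(xi)                         # (B, C, H, W) feature map
        f_of_flip = features(torch.flip(xi, dims=[-1]))
        flip_of_f = torch.flip(f, dims=[-1])
        loss = F.mse_loss(f_of_flip, flip_of_f)  # equivariance residual
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return classifier(features((x + delta).clamp(0, 1)))
```

A spatial transform of this kind yields a constraint at every feature location, which is what makes the objective dense without pinning the representation to specific values.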
Bidirectional Inference Networks: A Class of Deep Bayesian Networks for Health Profiling
We consider the problem of inferring the values of an arbitrary set of
variables (e.g., risk of diseases) given other observed variables (e.g.,
symptoms and diagnosed diseases) and high-dimensional signals (e.g., MRI images
or EEG). This is a common problem in healthcare since variables of interest
often differ for different patients. Existing methods including Bayesian
networks and structured prediction either do not incorporate high-dimensional
signals or fail to model conditional dependencies among variables. To address
these issues, we propose bidirectional inference networks (BIN), which stitch
together multiple probabilistic neural networks, each modeling a conditional
dependency. Predictions are then made by iteratively updating variables using
backpropagation (BP) to maximize the corresponding posterior probability.
Furthermore, we extend BIN to composite BIN (CBIN), which incorporates the
iterative prediction process into the training stage and improves both accuracy
and computational efficiency by adaptively smoothing the optimization
landscape. Experiments on synthetic and real-world datasets (a sleep study and
a dermatology dataset) show that CBIN is a single model that can achieve
state-of-the-art performance and obtain better accuracy in most inference tasks
than multiple models each specifically trained for a different task.
Comment: Appeared at AAAI 2019.
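The iterative BP-based prediction can be illustrated on a toy problem. The `GaussianConditional` network, the risk/symptom variables, and all shapes below are invented stand-ins, not the paper's architecture; the point is only the inference pattern of treating unobserved variables as free parameters and ascending the summed conditional log-likelihood.

```python
import torch

class GaussianConditional(torch.nn.Module):
    """One conditional dependency p(out | in), modeled as a Gaussian whose
    mean is an MLP of the inputs (a stand-in for BIN's component networks)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mean = torch.nn.Sequential(
            torch.nn.Linear(d_in, 32), torch.nn.Tanh(),
            torch.nn.Linear(32, d_out))
        self.log_std = torch.nn.Parameter(torch.zeros(d_out))

    def log_prob(self, out, inp):
        dist = torch.distributions.Normal(self.mean(inp), self.log_std.exp())
        return dist.log_prob(out).sum(-1)

# p(risk | signal) and p(symptom | risk): two stitched conditionals.
# (These would be pre-trained in practice; random here for brevity.)
p_risk, p_symptom = GaussianConditional(16, 1), GaussianConditional(1, 4)
signal, symptom = torch.randn(8, 16), torch.randn(8, 4)  # observed

# Inference: treat the unobserved risk as a free variable and run
# gradient ascent on the total conditional log-likelihood via backprop.
risk = torch.zeros(8, 1, requires_grad=True)
opt = torch.optim.Adam([risk], lr=0.05)
for _ in range(200):
    logp = p_risk.log_prob(risk, signal) + p_symptom.log_prob(symptom, risk)
    loss = -logp.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(risk.detach().squeeze())
```

Because each conditional is a separate network, the same stitched model can infer whichever subset of variables is missing for a given patient.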
Doubly Right Object Recognition: A Why Prompt for Visual Rationales
Many visual recognition models are evaluated only on their classification
accuracy, a metric for which they obtain strong performance. In this paper, we
investigate whether computer vision models can also provide correct rationales
for their predictions. We propose a "doubly right" object recognition
benchmark, where the metric requires the model to simultaneously produce both
the right labels as well as the right rationales. We find that state-of-the-art
visual models, such as CLIP, often provide incorrect rationales for their
categorical predictions. However, by transferring the rationales from language
models into visual representations through a tailored dataset, we show that we
can learn a "why prompt," which adapts large visual representations to
produce correct rationales. Visualizations and empirical experiments show that
our prompts significantly improve performance on doubly right object
recognition, in addition to zero-shot transfer to unseen tasks and datasets.
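The doubly-right evaluation can be sketched with the OpenAI CLIP package: score an image against prompts that pair a label with a rationale, and count a prediction as correct only when both parts are right. The label/rationale strings and the image path below are placeholder examples, and this zero-shot scoring stands in for, rather than reproduces, the learned why prompt.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical label/rationale pairs; a prediction is "doubly right"
# only when both the label and the rationale are correct.
prompts = [
    ("cat", "it has whiskers and pointed ears"),
    ("cat", "it has wings and a beak"),          # right label, wrong rationale
    ("dog", "it has floppy ears and a snout"),
]
texts = clip.tokenize(
    [f"a photo of a {c}, because {r}" for c, r in prompts]).to(device)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    logits_per_image, _ = model(image, texts)
    best = logits_per_image.softmax(dim=-1).argmax().item()
print("prediction:", prompts[best])
```

Scoring label and rationale jointly is what separates a model that is right for the right reasons from one that is merely right.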